Adjacency and incidence matrix provide relationship between several nodes. The information they contain can have different nature, thus this document will consider several examples:
directed and weighted. Like the number of people migrating from one country to another. Data used comes from this scientific publication from Gui J. Abel.# Libraries
library(tidyverse)
library(hrbrthemes)
library(circlize)
library(kableExtra)
options(knitr.table.format = "html")
library(viridis)
library(igraph)
library(ggraph)
# Load dataset from github
#data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/13_AdjacencyDirectedWeighted.csv", header=TRUE)
data <- read.table("../Example_dataset/13_AdjacencyDirectedWeighted.csv", header=TRUE)
# show data
data %>% head(3) %>% select(1:3) %>% kable() %>%
kable_styling(bootstrap_options = "striped", full_width = F)| Africa | East.Asia | Europe | |
|---|---|---|---|
| Africa | 3.142471 | 0.000000 | 2.107883 |
| East Asia | 0.000000 | 1.630997 | 0.601265 |
| Europe | 0.000000 | 0.000000 | 2.401476 |
undirected and unweighted. I will consider all the co-authors of a researcher and study who is connected through a common publication. Data have been retrieved using the scholar package, the pipeline is describe in this github repository. The result is an adjacency matrix with about 100 researchers, filled with 1 if they have published a paper together, 0 otherwise.# Load data
#dataUU <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/13_AdjacencyUndirectedUnweighted.csv", header=TRUE)
dataUU <- read.table("../Example_dataset/13_AdjacencyUndirectedUnweighted.csv", header=TRUE)
# show data
dataUU %>% head(3) %>% select(1:4) %>% kable() %>%
kable_styling(bootstrap_options = "striped", full_width = F)| from | A.Bateman | A.Besnard | A.Breil |
|---|---|---|---|
| A Armero | NA | NA | 1 |
| A Bateman | NA | NA | NA |
| A Besnard | NA | NA | NA |
Relationships can also be undirected and weighted
Relationships can also be directed and unweighted
Chord diagram is a good way to represent the migration flows. It works well if your data are directed and weighted like for migration flows between country.
Disclaimer: this plot is made using the circlize library, and very strongly inspired from the Migest package from Gui J. Abel, who is also the author of the migration dataset used here.
Since this kind of graphic is used to display flows, it can be applied only on network where connection are weighted. It does not work for the other example on authors connections.
# short names
colnames(data) <- c("Africa", "East Asia", "Europe", "Latin Ame.", "North Ame.", "Oceania", "South Asia", "South East Asia", "Soviet Union", "West.Asia")
rownames(data) <- colnames(data)
# I need a long format
data_long <- data %>%
rownames_to_column %>%
gather(key = 'key', value = 'value', -rowname)
# parameters
circos.clear()
circos.par(start.degree = 90, gap.degree = 4, track.margin = c(-0.1, 0.1), points.overflow.warning = FALSE)
par(mar = rep(0, 4))
# color palette
mycolor <- viridis(10, alpha = 1, begin = 0, end = 1, option = "D")
mycolor <- mycolor[sample(1:10)]
# Base plot
chordDiagram(
x = data_long,
grid.col = mycolor,
transparency = 0.25,
directional = 1,
direction.type = c("arrows", "diffHeight"),
diffHeight = -0.04,
annotationTrack = "grid",
annotationTrackHeight = c(0.05, 0.1),
link.arr.type = "big.arrow",
link.sort = TRUE,
link.largest.ontop = TRUE)
# Add text and axis
circos.trackPlotRegion(
track.index = 1,
bg.border = NA,
panel.fun = function(x, y) {
xlim = get.cell.meta.data("xlim")
sector.index = get.cell.meta.data("sector.index")
# Add names to the sector.
circos.text(
x = mean(xlim),
y = 3.2,
labels = sector.index,
facing = "bending",
cex = 0.8
)
# Add graduation on axis
circos.axis(
h = "top",
major.at = seq(from = 0, to = xlim[2], by = ifelse(test = xlim[2]>10, yes = 2, no = 1)),
minor.ticks = 1,
major.tick.percentage = 0.5,
labels.niceFacing = FALSE)
}
) In my opinion this is a powerful way to display information. Major flows are easy to detect, like the migration from South Asia towars Westa Asia, or Africa to Europe. Moreover, for each continent it is quite easy to quantify the proportion of people leaving and arriving.
However chord diagram is not an usual way of displaying information. Thus, it is advised to give a good amount of explanation to educate your audience. A good way to do so is to draw just a few connections in a first step, before displaying the whole graphic. See this blog post by Nadieh Bremer for more ideas on this topic.
Sankey diagram is another option to display weighted connection. Intead of displaying regions on a circle, they are duplicated and represented on both side of the graphic. Origin is usually on the left, destination on the right.
# Package
library(networkD3)
# I need a long format
data_long <- data %>%
rownames_to_column %>%
gather(key = 'key', value = 'value', -rowname) %>%
filter(value > 0)
colnames(data_long) <- c("source", "target", "value")
data_long$target <- paste(data_long$target, " ", sep="")
# From these flows we need to create a node data frame: it lists every entities involved in the flow
nodes <- data.frame(name=c(as.character(data_long$source), as.character(data_long$target)) %>% unique())
# With networkD3, connection must be provided using id, not using real name like in the links dataframe.. So we need to reformat it.
data_long$IDsource=match(data_long$source, nodes$name)-1
data_long$IDtarget=match(data_long$target, nodes$name)-1
# prepare colour scale
ColourScal ='d3.scaleOrdinal() .range(["#FDE725FF","#B4DE2CFF","#6DCD59FF","#35B779FF","#1F9E89FF","#26828EFF","#31688EFF","#3E4A89FF","#482878FF","#440154FF"])'
# Make the Network
sankeyNetwork(Links = data_long, Nodes = nodes,
Source = "IDsource", Target = "IDtarget",
Value = "value", NodeID = "name",
sinksRight=FALSE, colourScale=ColourScal, nodeWidth=40, fontSize=13, nodePadding=20)# with ggplot2 --> not very insightful if not clusterised
dataUU %>%
gather(key="to", value="value", -1) %>%
ggplot( aes(x=to, y=from, fill=value)) +
geom_tile()tmp <- dataUU
rownames(tmp) <- tmp[,1]
tmp <- tmp[,-1]
tmp <- as.matrix(tmp)
tmp[which(is.na(tmp))] <- 0
heatmap(tmp, na.rm=TRUE)summary(tmp)## A.Bateman A.Besnard A.Breil A.Cenci
## Min. :0.000000 Min. :0.00000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.000000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.000000 Median :0.00000 Median :0.00000 Median :0.00000
## Mean :0.008475 Mean :0.01695 Mean :0.02542 Mean :0.01695
## 3rd Qu.:0.000000 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.000000 Max. :1.00000 Max. :1.00000 Max. :1.00000
## A.Criscuolo A.Dassouli A.Dellagi A.Dereeper
## Min. :0.000000 Min. :0.00000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.000000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.000000 Median :0.00000 Median :0.00000 Median :0.00000
## Mean :0.008475 Mean :0.01695 Mean :0.02542 Mean :0.04237
## 3rd Qu.:0.000000 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.000000 Max. :1.00000 Max. :1.00000 Max. :1.00000
## A.Regnault AMA.Chifolleau B.Boussau B.Martret
## Min. :0.00000 Min. :0.00000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.00000 Median :0.00000 Median :0.00000 Median :0.00000
## Mean :0.01695 Mean :0.02542 Mean :0.04237 Mean :0.02542
## 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.00000 Max. :1.00000 Max. :1.00000 Max. :1.00000
## B.Noel B.Roure B.Vacherie C.Abessolo
## Min. :0.00000 Min. :0.00000 Min. :0.0000 Min. :0.000000
## 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.000000
## Median :0.00000 Median :0.00000 Median :0.0000 Median :0.000000
## Mean :0.02542 Mean :0.01695 Mean :0.0339 Mean :0.008475
## 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.0000 3rd Qu.:0.000000
## Max. :1.00000 Max. :1.00000 Max. :1.0000 Max. :1.000000
## C.Burgarella C.Douady C.Fizames C.Guilhaumon
## Min. :0.00000 Min. :0.000000 Min. :0.000000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.000000 1st Qu.:0.000000 1st Qu.:0.00000
## Median :0.00000 Median :0.000000 Median :0.000000 Median :0.00000
## Mean :0.02542 Mean :0.008475 Mean :0.008475 Mean :0.02542
## 3rd Qu.:0.00000 3rd Qu.:0.000000 3rd Qu.:0.000000 3rd Qu.:0.00000
## Max. :1.00000 Max. :1.000000 Max. :1.000000 Max. :1.00000
## C.Scornavacca CP.Meyer CW.Kilpatrick D.Baurain
## Min. :0.0000 Min. :0.00000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.0000 Median :0.00000 Median :0.00000 Median :0.00000
## Mean :0.1102 Mean :0.01695 Mean :0.02542 Mean :0.02542
## 3rd Qu.:0.0000 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.0000 Max. :1.00000 Max. :1.00000 Max. :1.00000
## D.Durand D.Expert D.Pot D.Sánchez
## Min. :0.00000 Min. :0.0000 Min. :0.000000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.000000 1st Qu.:0.00000
## Median :0.00000 Median :0.0000 Median :0.000000 Median :0.00000
## Mean :0.04237 Mean :0.0339 Mean :0.008475 Mean :0.01695
## 3rd Qu.:0.00000 3rd Qu.:0.0000 3rd Qu.:0.000000 3rd Qu.:0.00000
## Max. :1.00000 Max. :1.0000 Max. :1.000000 Max. :1.00000
## D.Vile DMD.Vienne E.Bazin E.Douzery
## Min. :0.000000 Min. :0.000000 Min. :0.00000 Min. :0.0000
## 1st Qu.:0.000000 1st Qu.:0.000000 1st Qu.:0.00000 1st Qu.:0.0000
## Median :0.000000 Median :0.000000 Median :0.00000 Median :0.0000
## Mean :0.008475 Mean :0.008475 Mean :0.01695 Mean :0.0339
## 3rd Qu.:0.000000 3rd Qu.:0.000000 3rd Qu.:0.00000 3rd Qu.:0.0000
## Max. :1.000000 Max. :1.000000 Max. :1.00000 Max. :1.0000
## E.Figuet E.Jacox EJP.Douzery EJPDNLH.Philippe
## Min. :0.000000 Min. :0.000000 Min. :0.0000 Min. :0.00000
## 1st Qu.:0.000000 1st Qu.:0.000000 1st Qu.:0.0000 1st Qu.:0.00000
## Median :0.000000 Median :0.000000 Median :0.0000 Median :0.00000
## Mean :0.008475 Mean :0.008475 Mean :0.1017 Mean :0.01695
## 3rd Qu.:0.000000 3rd Qu.:0.000000 3rd Qu.:0.0000 3rd Qu.:0.00000
## Max. :1.000000 Max. :1.000000 Max. :1.0000 Max. :1.00000
## EMA.LGI2P F.De.Lamotte F.Delsuc F.Gosti
## Min. :0.00000 Min. :0.0000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.00000 Median :0.0000 Median :0.00000 Median :0.00000
## Mean :0.01695 Mean :0.0339 Mean :0.09322 Mean :0.04237
## 3rd Qu.:0.00000 3rd Qu.:0.0000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.00000 Max. :1.0000 Max. :1.00000 Max. :1.00000
## F.Zhang FC.Baurens G.Droc G.Poux
## Min. :0.00000 Min. :0.000000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.000000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.00000 Median :0.000000 Median :0.00000 Median :0.00000
## Mean :0.04237 Mean :0.008475 Mean :0.05085 Mean :0.08475
## 3rd Qu.:0.00000 3rd Qu.:0.000000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.00000 Max. :1.000000 Max. :1.00000 Max. :1.00000
## G.Sarah G.Tsagkogeorga GJ.Szöllősi J.Alassimone
## Min. :0.00000 Min. :0.00000 Min. :0.00000 Min. :0.0000
## 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.0000
## Median :0.00000 Median :0.00000 Median :0.00000 Median :0.0000
## Mean :0.04237 Mean :0.01695 Mean :0.02542 Mean :0.0339
## 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.0000
## Max. :1.00000 Max. :1.00000 Max. :1.00000 Max. :1.0000
## J.Dainat J.David J.Melo.Ferreira J.Montmain
## Min. :0.00000 Min. :0.0000 Min. :0.00000 Min. :0.0000
## 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.0000
## Median :0.00000 Median :0.0000 Median :0.00000 Median :0.0000
## Mean :0.02542 Mean :0.0339 Mean :0.02542 Mean :0.0339
## 3rd Qu.:0.00000 3rd Qu.:0.0000 3rd Qu.:0.00000 3rd Qu.:0.0000
## Max. :1.00000 Max. :1.0000 Max. :1.00000 Max. :1.0000
## J.Romiguier JF.Dufayard JH.Wilson JL.David
## Min. :0.0000 Min. :0.00000 Min. :0.0000 Min. :0.000000
## 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.000000
## Median :0.0000 Median :0.00000 Median :0.0000 Median :0.000000
## Mean :0.0339 Mean :0.02542 Mean :0.0339 Mean :0.008475
## 3rd Qu.:0.0000 3rd Qu.:0.00000 3rd Qu.:0.0000 3rd Qu.:0.000000
## Max. :1.0000 Max. :1.00000 Max. :1.0000 Max. :1.000000
## JP.Doyon JT.Boehm JY.Dutheil JY.Yang
## Min. :0.0000 Min. :0.00000 Min. :0.00000 Min. :0.000000
## 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.000000
## Median :0.0000 Median :0.00000 Median :0.00000 Median :0.000000
## Mean :0.0339 Mean :0.04237 Mean :0.04237 Mean :0.008475
## 3rd Qu.:0.0000 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.000000
## Max. :1.0000 Max. :1.00000 Max. :1.00000 Max. :1.000000
## K.Belkhir K.Hadj.Kaddour KY.Gorbunov L.Duret
## Min. :0.00000 Min. :0.000000 Min. :0.00000 Min. :0.000000
## 1st Qu.:0.00000 1st Qu.:0.000000 1st Qu.:0.00000 1st Qu.:0.000000
## Median :0.00000 Median :0.000000 Median :0.00000 Median :0.000000
## Mean :0.07627 Mean :0.008475 Mean :0.01695 Mean :0.008475
## 3rd Qu.:0.00000 3rd Qu.:0.000000 3rd Qu.:0.00000 3rd Qu.:0.000000
## Max. :1.00000 Max. :1.000000 Max. :1.00000 Max. :1.000000
## L.Marquès M.Ardisson M.Ballenghien M.Bonnefoy
## Min. :0.00000 Min. :0.00000 Min. :0.0000 Min. :0.000000
## 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.000000
## Median :0.00000 Median :0.00000 Median :0.0000 Median :0.000000
## Mean :0.01695 Mean :0.09322 Mean :0.0339 Mean :0.008475
## 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.0000 3rd Qu.:0.000000
## Max. :1.00000 Max. :1.00000 Max. :1.0000 Max. :1.000000
## M.Crampes M.Tavaud MC.SOULIE MF.Sy
## Min. :0.0000 Min. :0.000000 Min. :0.00000 Min. :0.000000
## 1st Qu.:0.0000 1st Qu.:0.000000 1st Qu.:0.00000 1st Qu.:0.000000
## Median :0.0000 Median :0.000000 Median :0.00000 Median :0.000000
## Mean :0.0339 Mean :0.008475 Mean :0.01695 Mean :0.008475
## 3rd Qu.:0.0000 3rd Qu.:0.000000 3rd Qu.:0.00000 3rd Qu.:0.000000
## Max. :1.0000 Max. :1.000000 Max. :1.00000 Max. :1.000000
## MH.Muller MK.Tilak N.Agudelo N.Auberval
## Min. :0.00000 Min. :0.00000 Min. :0.0000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.00000
## Median :0.00000 Median :0.00000 Median :0.0000 Median :0.00000
## Mean :0.02542 Mean :0.01695 Mean :0.0339 Mean :0.02542
## 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.0000 3rd Qu.:0.00000
## Max. :1.00000 Max. :1.00000 Max. :1.0000 Max. :1.00000
## N.Chantret N.Galtier NO.Rode P.Augereau
## Min. :0.00000 Min. :0.00000 Min. :0.0000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.00000
## Median :0.00000 Median :0.00000 Median :0.0000 Median :0.00000
## Mean :0.05085 Mean :0.09322 Mean :0.0339 Mean :0.02542
## 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.0000 3rd Qu.:0.00000
## Max. :1.00000 Max. :1.00000 Max. :1.0000 Max. :1.00000
## P.Chevret P.Gayral P.Leroy P.Roumet
## Min. :0.000000 Min. :0.000000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.000000 1st Qu.:0.000000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.000000 Median :0.000000 Median :0.00000 Median :0.00000
## Mean :0.008475 Mean :0.008475 Mean :0.02542 Mean :0.08475
## 3rd Qu.:0.000000 3rd Qu.:0.000000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.000000 Max. :1.000000 Max. :1.00000 Max. :1.00000
## PD.Jenkins PH.Fabre S.Bocs S.Diser
## Min. :0.0000 Min. :0.00000 Min. :0.00000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.0000
## Median :0.0000 Median :0.00000 Median :0.00000 Median :0.0000
## Mean :0.0339 Mean :0.01695 Mean :0.01695 Mean :0.0339
## 3rd Qu.:0.0000 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.0000
## Max. :1.0000 Max. :1.00000 Max. :1.00000 Max. :1.0000
## S.Gaillard S.Gautier S.Glémin S.Guillemot
## Min. :0.000000 Min. :0.00000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.000000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.000000 Median :0.00000 Median :0.00000 Median :0.00000
## Mean :0.008475 Mean :0.04237 Mean :0.09322 Mean :0.02542
## 3rd Qu.:0.000000 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.000000 Max. :1.00000 Max. :1.00000 Max. :1.00000
## S.Harispe S.Pointet S.Pourali S.Santoni
## Min. :0.00000 Min. :0.00000 Min. :0.00000 Min. :0.0000
## 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.0000
## Median :0.00000 Median :0.00000 Median :0.00000 Median :0.0000
## Mean :0.01695 Mean :0.01695 Mean :0.01695 Mean :0.1102
## 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.0000
## Max. :1.00000 Max. :1.00000 Max. :1.00000 Max. :1.0000
## SC.Mills T.Lengauer U.Jordan UK.Hinxton
## Min. :0.00000 Min. :0.00000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.00000 Median :0.00000 Median :0.00000 Median :0.00000
## Mean :0.02542 Mean :0.02542 Mean :0.01695 Mean :0.01695
## 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.00000 Max. :1.00000 Max. :1.00000 Max. :1.00000
## V.Berry V.Daubin V.Dion V.Lefort
## Min. :0.0000 Min. :0.0000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.0000 Median :0.0000 Median :0.00000 Median :0.00000
## Mean :0.1356 Mean :0.0339 Mean :0.02542 Mean :0.01695
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.0000 Max. :1.0000 Max. :1.00000 Max. :1.00000
## V.Viader W.Paprotny Y.Holtz Y.Moreau
## Min. :0.00000 Min. :0.000000 Min. :0.00000 Min. :0.0000
## 1st Qu.:0.00000 1st Qu.:0.000000 1st Qu.:0.00000 1st Qu.:0.0000
## Median :0.00000 Median :0.000000 Median :0.00000 Median :0.0000
## Mean :0.09322 Mean :0.008475 Mean :0.05932 Mean :0.0339
## 3rd Qu.:0.00000 3rd Qu.:0.000000 3rd Qu.:0.00000 3rd Qu.:0.0000
## Max. :1.00000 Max. :1.000000 Max. :1.00000 Max. :1.0000
Since an adjacency matrix is a network structure, it is possible to build a network graph. In my opinion, this makes more sense when the connection are unweighted, since drawing edges with different sizes tends to clutter the figure and make it unreadable.
Thus, here is an application of this chart type to the coauthor network. When to people have published at least one scientific paper together, they are connected.
# Transform the adjacency matrix in a long format
connect <- dataUU %>%
gather(key="to", value="value", -1) %>%
na.omit()
# Number of connection per person
c( as.character(connect$from), as.character(connect$to)) %>%
as.tibble() %>%
group_by(value) %>%
summarize(n=n()) -> coauth
colnames(coauth) <- c("name", "n")
# Create a graph object with igraph
mygraph <- graph_from_data_frame( connect, vertices = coauth )
plot(mygraph)# Make the graph
ggraph(mygraph, layout="nicely") +
#geom_edge_density(edge_fill="#69b3a2") +
geom_edge_link(edge_colour="black", edge_alpha=0.2, edge_width=0.3) +
geom_node_point(aes(size=n, alpha=n)) +
theme_void() +
theme(
legend.position="none",
plot.margin=unit(c(0,0,0,0), "null"),
panel.spacing=unit(c(0,0,0,0), "null")
) Network chart is a whole field of study on its own and overtake the scope of this website from far. The biggest challenge is to position each node efficiently, to avoid crosses between edges. Several algorithms exist to tackle this issue and it is always a good practice to try several of them.
# Make the graph
ggraph(mygraph, layout="auto") +
#geom_edge_density(edge_fill="#69b3a2") +
geom_edge_link(edge_colour="black", edge_alpha=0.2, edge_width=0.3) +
geom_node_point(aes(size=n, alpha=n)) +
theme_void() +
theme(
legend.position="none",
plot.margin=unit(c(0,0,0,0), "null"),
panel.spacing=unit(c(0,0,0,0), "null")
) ## Using `nicely` as default layout
ggraph(mygraph, layout="igraph", algorithm="kk") +
#geom_edge_density(edge_fill="#69b3a2") +
geom_edge_link(edge_colour="black", edge_alpha=0.2, edge_width=0.3) +
geom_node_point(aes(size=n, alpha=n)) +
theme_void() +
theme(
legend.position="none",
plot.margin=unit(c(0,0,0,0), "null"),
panel.spacing=unit(c(0,0,0,0), "null")
) Instead of using a custom algorithm to position each nodes, it is possible to place them al around a circule, a bit like for a chord diagram. But this kind of chart makes sense only if the order of nodes around the circule is carefully chosen, to avoid aving a fusball.
#dataUU %>% hclust()
#help(hclust)
#dist(dataUU)
#cor(dataUU[,-1], use="complete.obs")
#summary(dataUU)
#as.dist(dataUU)
#dim(dataUU)
#hclust(d, method = "complete", members = NULL)
# Make the graph
ggraph(mygraph, layout="circle") +
#geom_edge_density(edge_fill="#69b3a2") +
geom_edge_link(edge_colour="black", edge_alpha=0.2, edge_width=0.3) +
geom_node_point(aes(size=n, alpha=n)) +
theme_void() +
theme(
legend.position="none",
plot.margin=unit(c(0,0,0,0), "null"),
panel.spacing=unit(c(0,0,0,0), "null")
) This is basically the same concept, but where nodes are displayed all along a line
# Make the graph
ggraph(mygraph, layout="linear") +
geom_edge_arc(edge_colour="black", edge_alpha=0.2, edge_width=0.3, fold=TRUE) +
geom_node_point(aes(size=n, alpha=n)) +
geom_node_text(aes(label=name), angle=45, hjust=1) +
theme_void() +
theme(
legend.position="none",
plot.margin=unit(c(0,0,0.4,0), "null"),
panel.spacing=unit(c(0,0,3.4,0), "null")
) A work by Yan Holtz for data-to-viz.com